Abstract



Invariant mass spectra for jets reconstructed using the anti-$k_T$ and Cambridge-Aachen
algorithms are studied for different jet "grooming" techniques in data corresponding
to an integrated luminosity of 5 fb$^{-1}$, recorded with the CMS detector in
proton-proton collisions at the LHC at a center-of-mass energy of 7 TeV. Leading-order
QCD predictions for inclusive dijet and W/Z+jet production combined with
parton-shower Monte Carlo models are found to agree overall with the data, and the
agreement improves with the implementation of jet grooming methods used to distinguish
merged jets of large transverse momentum from softer QCD gluon radiation.



Submitted to the Journal of High Energy Physics 



© 2013 CERN for the benefit of the CMS Collaboration. CC-BY-3.0 license 



*See Appendix A for the list of collaboration members






1 Introduction 

The variables most often used in analyses of jet production are jet directions and transverse 
momenta ($p_T$). However, as jets are composite objects, their invariant masses ($m_J$) provide
additional information that can be used to characterize their properties. One motivation for 
investigating jet mass is that, at the Large Hadron Collider (LHC), massive standard model 
(SM) particles such as W and Z bosons and top quarks are often produced with large Lorentz 
boosts, and, when such particles decay into quarks, the masses of the evolved jets can be used to 
discriminate them from lighter objects generated in quantum-chromodynamic (QCD) radiative 
processes. The same argument also holds for any new massive particles produced at the LHC. 
For sufficiently large boosts, all the decay products tend to be emitted as collimated groupings 
into small sections of the detector, and the resulting particles can be clustered into a single jet. 
Jet "grooming" techniques are designed to separate such merged jets from background. These 
new techniques have been found to be very promising for identifying decays of highly-boosted 
W bosons and top quarks, and in searches for Higgs bosons and other massive particles [1]. The
main advantage of these grooming techniques is their ability to distinguish high-$p_T$ jets that
arise from decays of massive, possibly new, particles. In addition, their robust performance 
is valuable in the presence of additional interactions in an event (pileup), which is likely to 
provide an even greater challenge to such analyses in future higher-luminosity runs at the 
LHC. 

Only a few of these promising approaches have been studied in data at the Tevatron [2] or at the
LHC [3]. To understand these techniques in the context of searches for new phenomena, the jet
mass must be well-modeled through leading-order (LO) or next-to-leading-order (NLO) Monte 
Carlo (MC) simulations. Much recent theoretical work in QCD has focused on the computa- 
tion of jet mass, including predictions using advances in an effective field theory of jets (soft 
collinear effective theory, SCET) [4-53]. Studies of the kind reported in the present analysis can 
provide an understanding of the extent to which MC simulations that match matrix-element 
partons with parton showers can model the observed internal jet structure. Results of these 
studies can also be used to compare data with theoretical computations of jet mass, and to pro- 
vide benchmarks for the use of these algorithms in searches for highly-boosted Higgs bosons, 
or new objects beyond the SM, especially by investigating some of the background processes 
expected in such analyses. 

We present a measurement of jet mass in a sample of dijet events, and the first study of such 
distributions in V+jet events, where V refers to a W or Z boson. The data correspond to an 
integrated luminosity of 5.0 ± 0.2 fb$^{-1}$, collected by the Compact Muon Solenoid (CMS)
experiment at the LHC in pp interactions at a center-of-mass energy of 7 TeV. The analysis of
these two types of final states provides complementary information because of their different 
parton-flavor content, since the selected dijet events are dominated by gluon-initiated jets, and 
the V+jet events often contain quark-initiated jets. We focus on measuring the jet mass
after applying several jet grooming techniques involving "filtering" [24], "trimming" [25], and
"pruning" [26, 27] of jets, as discussed in detail below. This work also presents the first attempt
to measure the mass of trimmed and pruned jets. 

To study the dependence of the differential distributions in $m_J$ on jet $p_T$, we measure the
distributions in intervals of jet transverse momentum. Formally, this can be expressed in terms of a
double-differential cross section for jet production ($d^2\sigma / dp_T\,dm_J$) that is examined as a
function of $m_J$ for several nonoverlapping intervals in $p_T$:

$$\frac{d\sigma_i}{dm_J} = \int_{i} \frac{d^2\sigma}{dp_T\,dm_J}\,dp_T, \qquad (1)$$

where $i = 1, 2, 3, \ldots$ refers to the interval in $p_T$, and the sum of contributions over all $i$ is
equal to the total observed cross section, $\sum_i \sigma_i = \sigma$. The differential probability density as a
function of $m_J$ for each $p_T$ interval can therefore be written as



$$P_i(m_J) = \frac{1}{\sigma_i} \times \frac{d\sigma_i}{dm_J}, \quad \text{with} \quad \int P_i(m_J)\,dm_J = 1. \qquad (2)$$

The distributions in reconstructed jet mass of Eq. (2) include corrections used to unfold jets to
the "particle" level; the $p_T$ intervals are defined for ungroomed jets, following energy
corrections for the response of the detector.
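
As an illustration of the normalization in Eq. (2), the following minimal sketch builds a binned estimate of $P_i(m_J)$ from a list of jet masses that all fall in one $p_T$ interval. The function name `mass_density` and the choice of bin edges are assumptions made for this example; unfolding and event weights, which the measurement applies, are deliberately left out.

```python
def mass_density(masses, edges):
    """Binned estimate of P_i(m_J): counts / (N_total * bin width).

    `masses` are jet masses (GeV) from a single pT interval; `edges` are
    ascending bin edges. Assumes at least one mass falls inside the edges.
    """
    counts = [0] * (len(edges) - 1)
    for m in masses:
        for k in range(len(counts)):
            if edges[k] <= m < edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts)
    return [c / (total * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


edges = [0.0, 10.0, 20.0, 40.0]
density = mass_density([5.0, 15.0, 15.0, 35.0], edges)
# The integral over m_J (sum of density times bin width) is 1 by construction.
integral = sum(d * (edges[k + 1] - edges[k]) for k, d in enumerate(density))
```

Dividing by the bin width makes the result a density, so distributions with different binning can be compared directly.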

For the dijet analysis, $p_T$ and $m_J$ correspond to the average transverse momentum and average
jet mass of the two leading jets (i.e., of highest $p_T$): $p_T^{AVG} = (p_{T1} + p_{T2})/2$ and
$m_J^{AVG} = (m_{J1} + m_{J2})/2$. For the V+jet analysis, we use the $m_J$ and $p_T$ of the leading jet.
Both quantities depend on the nature of the jet grooming algorithm, as discussed in Section 2.

This paper is organized as follows. To introduce the subject, we first discuss jet clustering
algorithms in Section 2, focusing mainly on grooming techniques. After a brief description of
the CMS detector and the MC samples in Section 3, we provide information pertaining to the
collected data and a description of event reconstruction in Section 4. Selection of events is then
described in Section 5, and the effect of pileup on jet mass is investigated in Section 6. This
is followed in Section 7 by the correction and unfolding procedures that are applied to the $m_J$
spectra and their corresponding systematic uncertainties. In Sections 8 and 9, we present the
results of the dijet and V+jet analyses, respectively. Finally, observations and remarks on the
presented results are summarized in Section 10.



The distributions shown are also stored in HEPData format 



2 Jet clustering algorithms and grooming techniques 
2.1 Sequential jet clustering algorithms 

Jets are defined through sequential, iterative jet clustering algorithms that combine four-vectors
of input pairs of particles until certain criteria are satisfied and jets are formed. For the jet
algorithms considered in this paper, for each pair of particles $i$ and $j$, a "distance" metric between
the two particles ($d_{ij}$), and the so-called "beam distance" for each particle ($d_{iB}$), are computed:

$$d_{ij} = \min(p_{Ti}^{2n}, p_{Tj}^{2n})\, \Delta R_{ij}^2 / R^2, \qquad (3)$$
$$d_{iB} = p_{Ti}^{2n}, \qquad (4)$$

where $p_{Ti}$ and $p_{Tj}$ are the transverse momenta of particles $i$ and $j$, respectively, "min" refers
to the lesser of the two $p_T^{2n}$ values, the integer $n$ depends on the specific jet algorithm,
$\Delta R_{ij} = \sqrt{(\Delta y_{ij})^2 + (\Delta\phi_{ij})^2}$ is the distance between $i$ and $j$ in rapidity
($y = \frac{1}{2} \ln[(E + p_z)/(E - p_z)]$) and azimuth ($\phi$), and $R$ is the "size" parameter of order
unity [29], with all angles expressed in radians. The particle pair $(i, j)$ with smallest $d_{ij}$ is
combined into a single object. All distances are recalculated using the new object, and the procedure
is repeated until, for a given object $i$, all the $d_{ij}$ are greater than $d_{iB}$. Object $i$ is then
classified as a jet and not considered further in the algorithm. The process is repeated until all
input particles are clustered into jets.
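
The clustering loop described above can be sketched in a few lines. This is a minimal, unoptimized illustration of the inclusive procedure built from Eqs. (3) and (4), not the FastJet implementation used in the analysis; the function names, the four-vector layout `(E, px, py, pz)`, and the requirement that every input has nonzero $p_T$ and finite rapidity are assumptions of this example.

```python
import math


def from_pt_y_phi(pt, y, phi):
    """Massless four-vector (E, px, py, pz) from pT, rapidity, and azimuth."""
    return (pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y))


def kinematics(p):
    """Return (pT, y, phi) of a four-vector p = (E, px, py, pz)."""
    E, px, py, pz = p
    return math.hypot(px, py), 0.5 * math.log((E + pz) / (E - pz)), math.atan2(py, px)


def cluster(particles, R, n):
    """Inclusive sequential clustering; n = 1 (KT), 0 (CA), -1 (AK)."""
    objs = [tuple(p) for p in particles]
    jets = []
    while objs:
        kin = [kinematics(p) for p in objs]
        # Smallest beam distance d_iB = pT^(2n), Eq. (4).
        i_beam = min(range(len(objs)), key=lambda i: kin[i][0] ** (2 * n))
        d_min, pair = kin[i_beam][0] ** (2 * n), None
        # Look for a pairwise distance d_ij, Eq. (3), below the smallest d_iB.
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                dy = kin[i][1] - kin[j][1]
                dphi = abs(kin[i][2] - kin[j][2])
                if dphi > math.pi:
                    dphi = 2 * math.pi - dphi
                dr2 = dy * dy + dphi * dphi
                dij = min(kin[i][0] ** (2 * n), kin[j][0] ** (2 * n)) * dr2 / R ** 2
                if dij < d_min:
                    d_min, pair = dij, (i, j)
        if pair is None:
            jets.append(objs.pop(i_beam))  # no d_ij below d_iB: declare a jet
        else:
            i, j = pair
            merged = tuple(a + b for a, b in zip(objs[i], objs[j]))
            objs.pop(j)  # j > i, so pop j first to keep index i valid
            objs.pop(i)
            objs.append(merged)
    return jets


# Two nearby particles and one well-separated particle give two jets.
parts = [from_pt_y_phi(100, 0.0, 0.0), from_pt_y_phi(50, 0.1, 0.1), from_pt_y_phi(30, 0.0, 2.5)]
ak_jets = cluster(parts, R=0.7, n=-1)
```

The pairwise search makes each iteration quadratic in the number of objects, which is adequate for illustration but far slower than dedicated clustering libraries.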

The value for $n$ in Eqs. (3) and (4) governs the topological properties of the jets. For $n = 1$ the
procedure is referred to as the $k_T$ algorithm (KT). The KT jets tend to have irregular shapes and
are especially useful for reconstructing jets of lower momentum [29]. For this reason, they are
also sensitive to the presence of low-$p_T$ pileup (PU) contributions, and are used to compute the
mean $p_T$ per unit area (in $(y, \phi)$) of an event [30]. For $n = -1$, the procedure is called the anti-$k_T$
(AK) algorithm, with features close to an idealized cone algorithm. The AK algorithm is used
extensively in LHC experiments and by the theoretical community for finding well-separated
jets [29]. For $n = 0$, the procedure is called the Cambridge-Aachen (CA) algorithm. This relies
only on angular information, and, like the $k_T$ algorithm, provides irregularly-shaped jets in
$(y, \phi)$. The CA algorithm is useful in identifying jet substructure [31-33].

Jet grooming techniques [26] that reduce the impact of contributions from the underlying event
(UE), PU, and low-$p_T$ gluon radiation can be useful irrespective of the specific nature of the
analysis. These kinds of contributions to jets are typically soft and diffuse, and hence contribute
energy to the jet proportional to the area [30]. Because grooming techniques reduce the areas of
jets without affecting the core components, the resulting jets are less sensitive to contributions
from UE and PU, while still reflecting the kinematics of the hard original process. We consider
three forms of grooming, referred to as filtering, trimming, and pruning. Such techniques can
be applied to jets clustered through different algorithms (KT, AK, or CA). For the dijet analysis,
we choose to cluster jets with the anti-$k_T$ algorithm with $R = 0.7$ (AK7), as these are used
extensively at CMS. For the V+jet analysis, in addition to AK7 jets, we also study CA jets with
$R = 0.8$ (CA8), considered in recent publications involving top-quark tagging [34], and with
$R = 1.2$ (CA12), which was proposed for analyses involving highly-boosted objects [24]. After
the initial jet clustering with AK7, CA8, or CA12, the constituents of those jets are reclustered
with a (possibly different) jet algorithm (e.g., KT, CA, or AK), applying additional grooming
conditions to the sequence of selection criteria used for clustering. The optimal choice of this
secondary clustering algorithm depends on the grooming technique, as described below. For
the techniques we have investigated, the parameters chosen for the algorithms correspond to
those chosen by Refs. [24-27]; nevertheless, specific optimization would appear to be advisable
for all well-defined searches for new phenomena.

2.2 Filtering algorithm 

The "mass-drop/filtering" procedure aims to identify symmetric splitting of jets of large $p_T$
that have large $m_J$ values. It was proposed initially for use in searches for the Higgs boson [24],
but we consider just the filtering aspects of this algorithm for grooming jets.

For each jet obtained in the initial clustering procedure, the filtering algorithm defines a new,
groomed jet through the following algorithm: (i) the constituents of each jet are reclustered
using the CA algorithm with $R = 0.3$, thereby defining $n$ new subjets $s_1, \ldots, s_n$, ordered in
descending $p_T$, and (ii) the four-momentum of the new jet is defined by the four-vector sum
over the three subjets of hardest $p_T$, or in the rare case that $n < 3$, just these remaining subjets
define the new jet.
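
Step (ii) can be sketched as follows, assuming the CA reclustering of step (i) has already produced the subjets as four-vectors `(E, px, py, pz)`. The helper names are hypothetical and the subjet values are invented for illustration.

```python
import math


def pt(p):
    """Transverse momentum of a four-vector (E, px, py, pz)."""
    return math.hypot(p[1], p[2])


def mass(p):
    """Invariant mass; small negative m^2 from rounding is clipped to zero."""
    E, px, py, pz = p
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))


def filter_jet(subjets, n_keep=3):
    """Four-vector sum of the n_keep hardest subjets (all of them if fewer).

    Assumes a non-empty list of subjets.
    """
    hardest = sorted(subjets, key=pt, reverse=True)[:n_keep]
    return tuple(sum(c) for c in zip(*hardest))


# Four illustrative subjets; the softest one (pT = 1 GeV) is dropped.
subjets = [(50.0, 30.0, 40.0, 0.0), (20.0, 0.0, 20.0, 0.0),
           (10.0, 10.0, 0.0, 0.0), (1.0, 0.0, 1.0, 0.0)]
groomed = filter_jet(subjets)
ungroomed = tuple(sum(c) for c in zip(*subjets))
```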



The new jet has fewer particles than the initial jet, thereby reducing the contribution from
effects such as underlying event and pileup, and the new $m_J$ and $p_T$ values are therefore smaller
than those of the initial jet. As will be demonstrated in Section 2.5, with this choice of
parameters, filtering removes the fewest jet constituents, and is therefore the least aggressive of the
investigated jet grooming techniques.

2.3 Trimming algorithm

Trimming ignores particles within a jet that fall below a dynamic threshold in $p_T$ [25]. It
reclusters the jet's constituents using the $k_T$ algorithm with a radius $R_{sub}$, accepting only the subjets
that have $p_{T,sub} > f_{cut} \Lambda_{hard}$, where $f_{cut}$ is a dimensionless cutoff parameter, and $\Lambda_{hard}$ is some
hard QCD scale chosen to equal the $p_T$ of the original jet. The $R_{sub}$ and $f_{cut}$ parameters of the
algorithm are taken to be 0.2 and 0.03, respectively. As will be demonstrated, with this choice of
parameters, trimming removes more jet constituents than the filtering procedure, but fewer jet
constituents than pruning, and corresponds therefore to a moderately aggressive jet grooming
technique.
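
The trimming threshold can be illustrated with a short sketch that starts from subjets already obtained with the $k_T$ reclustering. The function name `trim_jet` and the example four-vectors `(E, px, py, pz)` are assumptions of this illustration; $\Lambda_{hard}$ is taken as the $p_T$ of the summed subjets, which stands in for the original jet.

```python
import math


def trim_jet(subjets, f_cut=0.03):
    """Keep subjets with pT > f_cut * Lambda_hard and return their sum.

    Lambda_hard is the pT of the four-vector sum of all subjets.
    """
    total = tuple(sum(c) for c in zip(*subjets))
    lambda_hard = math.hypot(total[1], total[2])
    kept = [s for s in subjets if math.hypot(s[1], s[2]) > f_cut * lambda_hard]
    if not kept:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(sum(c) for c in zip(*kept))


# The 1 GeV subjet is below 3% of the jet pT (about 68 GeV) and is removed.
subjets = [(50.0, 30.0, 40.0, 0.0), (20.0, 0.0, 20.0, 0.0), (1.0, 0.0, 1.0, 0.0)]
trimmed = trim_jet(subjets)
```

Because the threshold scales with the jet $p_T$, the same absolute soft contribution is more likely to be removed from a harder jet.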

2.4 Pruning algorithm 

Following the clustering of jets using the original algorithm (either AK7, CA8, or CA12), the
pruning algorithm [26, 27] reclusters the constituents of the jet through the CA algorithm,
using the same distance parameter, but additional conditions beyond those given in Eq. (3). In
particular, the softer of the two particles $i$ and $j$ to be merged is removed when the following
conditions are met:

$$z_{ij} = \frac{\min(p_{Ti}, p_{Tj})}{|\vec{p}_{Ti} + \vec{p}_{Tj}|} < z_{cut}, \qquad (5)$$
$$\Delta R_{ij} > D_{cut} = \alpha \cdot \frac{m_J}{p_T}, \qquad (6)$$

where $m_J$ and $p_T$ are the mass and transverse momentum of the originally-clustered jet, and $z_{cut}$
and $\alpha$ are parameters of the algorithm, chosen to be 0.1 and 0.5, respectively. In our particular
choice of parameters, we have chosen to divide the jet into two "exclusive" subjets (similarly to
the exclusive $k_T$ algorithm [29], where one clusters constituents until the jets are all separated
by the parameter $R$ in Eq. (3)). As will be demonstrated, with this choice of parameters, pruning
removes the largest number of jet constituents, and can therefore be regarded as the most
aggressive jet grooming technique investigated. It was previously used in the CMS search for $t\bar{t}$
resonances [34].
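
The two pruning conditions can be checked for a single candidate merging as sketched below. This is only the veto test, not the full CA reclustering; the function name `prune_veto` and the `(pT, y, phi)` input layout are assumptions of this example, and the denominator of $z_{ij}$ is taken as the magnitude of the vector sum of the two transverse momenta.

```python
import math


def prune_veto(a, b, m_jet, pt_jet, z_cut=0.1, alpha=0.5):
    """Return True if the softer of a and b should be removed.

    a and b are (pT, y, phi) tuples; m_jet and pt_jet describe the
    originally clustered jet. Assumes the summed pT is nonzero.
    """
    (pt_a, y_a, phi_a), (pt_b, y_b, phi_b) = a, b
    # pT of the would-be merged object: magnitude of the vector sum.
    pt_sum = math.hypot(pt_a * math.cos(phi_a) + pt_b * math.cos(phi_b),
                        pt_a * math.sin(phi_a) + pt_b * math.sin(phi_b))
    z = min(pt_a, pt_b) / pt_sum
    dphi = abs(phi_a - phi_b)
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    delta_r = math.hypot(y_a - y_b, dphi)
    d_cut = alpha * m_jet / pt_jet
    return z < z_cut and delta_r > d_cut


hard = (100.0, 0.0, 0.0)
# For m_jet = 40 GeV and pt_jet = 200 GeV, D_cut = 0.5 * 40 / 200 = 0.1.
soft_wide = prune_veto(hard, (5.0, 0.3, 0.3), m_jet=40.0, pt_jet=200.0)
balanced = prune_veto(hard, (50.0, 0.3, 0.3), m_jet=40.0, pt_jet=200.0)
soft_collinear = prune_veto(hard, (5.0, 0.02, 0.02), m_jet=40.0, pt_jet=200.0)
```

Only the soft, wide-angle branch is vetoed; a balanced splitting or a soft but collinear emission is kept.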



2.5 Groomed jet mass 

Figure 1 shows a comparison of distributions in the dijet sample for the ratio of groomed AK7
jet mass to the mass of the matched ungroomed AK7 jet, for our three grooming techniques, for
data and for PYTHIA6 MC simulation [35], using the Z2 tune. Three distributions are shown
for each grooming technique: (i) the reconstructed data ("data RECO"), (ii) the reconstructed 
simulated PYTHIA6 data ("PYTHIA RECO"), and (iii) the generated particle-level jets from 
PYTHIA6 ("PYTHIA GEN"). These three grooming techniques involve different jet algorithms 
for grooming (CA for filtering and pruning, $k_T$ for trimming) once the jets are found with AK7.
The data and the simulation exhibit similar behavior. In general, the filtering algorithm is 
the least aggressive grooming technique, with groomed jet masses close to the ungroomed 
values. The trimming algorithm is moderately aggressive, and the pruning algorithm is the 
most aggressive of the three. With pruning, a bimodal distribution begins to appear, which 
is typical of our implementation of this algorithm as we require clustering into two exclusive 
subjets. In cases where the pruned jet mass is small, jets usually have most of their energy 
configured in "core" components, with little gluon radiation, which leads to narrow jets. When 
the pruned jet mass is large, the jets are split more symmetrically, which can be realized in
events with gluons splitting into two nodes that fall within $\Delta R = 0.7$ of the original parton.

Figure 1: Distributions in differential probability for ratios of the jet mass of groomed jets to
their corresponding ungroomed values, for both dijet data and PYTHIA6 (tune Z2) MC simula- 
tion, for the three grooming techniques discussed in the text: (i) filtering (circles, peaking near 
0.9), (ii) trimming (squares, peaking near 0.75), and (iii) pruning (triangles, more dispersed). 



3 The CMS detector and simulation 

The CMS detector [36] is a general-purpose device with many features suited for reconstruction
of energetic jets, specifically, the finely segmented electromagnetic and hadronic calorimeters 
and charged-particle tracking detectors. 

CMS uses a right-handed coordinate system, with origin defined by the center of the CMS 
detector, the x axis pointing to the center of the LHC ring, the y axis pointing up, perpendicular 
to the plane of the LHC ring, and the z axis along the direction of the counterclockwise beam. 
The polar angle $\theta$ is measured relative to the positive z axis and the azimuthal angle $\phi$ relative
to the x axis in the x-y plane.

Charged particles are reconstructed in the inner silicon tracker, which is immersed in a 3.8 T 
axial magnetic field. The CMS tracking detector consists of an inner silicon pixel detector com- 
posed of three concentric central layers and two sets of disks arranged forward and backward 
of the center, and up to ten silicon strip central layers and three inner and nine outer strip disks 
forward and backward of the center. This arrangement provides full azimuthal coverage for 
$|\eta| < 2.5$, where $\eta = -\ln[\tan(\theta/2)]$ is the pseudorapidity. The pseudorapidity approximates
the rapidity y and equals y for massless particles. Since many of the reconstructed jets are not 
massless, we use the rapidity y for characterizing jets in this analysis. 
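
The difference between the two quantities can be made concrete with a small numerical sketch. The function names and the example momenta are chosen for illustration only.

```python
import math


def pseudorapidity(px, py, pz):
    """eta = -ln[tan(theta/2)], with theta the polar angle from the z axis."""
    theta = math.atan2(math.hypot(px, py), pz)
    return -math.log(math.tan(theta / 2))


def rapidity(E, pz):
    """y = (1/2) ln[(E + pz) / (E - pz)]."""
    return 0.5 * math.log((E + pz) / (E - pz))


# A momentum of magnitude 13 GeV: for a massless object E = |p| = 13 GeV.
px, py, pz = 3.0, 4.0, 12.0
eta = pseudorapidity(px, py, pz)
y_massless = rapidity(13.0, pz)
# The same momentum carried by a massive jet (E = 14 GeV) has smaller |y|.
y_massive = rapidity(14.0, pz)
```

For the massless case both values equal $\ln 5$, while the massive jet has a rapidity of $\frac{1}{2}\ln 13$, which is why $y$ is the better label for jets with sizable mass.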

A lead tungstate crystal electromagnetic calorimeter (ECAL) and a brass/scintillator hadronic
calorimeter (HCAL) surround the tracking volume and provide photon, electron, and jet
reconstruction up to $|\eta| = 3$. The ECAL and HCAL cells are grouped into towers projecting radially
outward from the center of the detector. In the central region ($|\eta| < 1.74$), the towers have
dimensions of $\Delta\eta = \Delta\phi = 0.087$ that increase at larger $|\eta|$. ECAL and HCAL cell energies above
some chosen noise-suppression thresholds are combined within each tower to define the tower
energy. Muons are measured in gas-ionization detectors embedded in the steel return yoke
outside the solenoid. To improve reconstruction of jets, the tracking and calorimeter information
is combined in a "particle-flow" (PF) algorithm [37], which is described in Section 4.4.



For the analysis of dijet events, samples are simulated with PYTHIA6.4 (Tune Z2) [35, 38],
PYTHIA8 (Tune 4c) [39], and HERWIG++ (Tune 23) [40], and propagated through the
simulation of the CMS detector based on Geant4 [41]. Underlying event (UE) and pileup (PU) are
included in the simulations, which are also reweighted to have the simulated PU distribution
match the observed PU distribution in the data.

For the V+jet analysis, events with a vector boson produced in association with jets are
simulated using MadGraph 5.1 [42]. This matrix element generator is also used to simulate $t\bar{t}$
events. The MadGraph events are subsequently subjected to parton showering, simulated
with PYTHIA6 using the Z2 Tune [38]. To compare hadronization in different generators, we
generate V+jet samples in which parton showering and hadronization are simulated with
HERWIG++. Diboson (WW, WZ, and ZZ) events are also generated with PYTHIA6. Single-top-quark
samples are produced with POWHEG [43], and the lepton enriched dijet samples are produced
with PYTHIA6 using the Z2 Tune. CTEQ6L1 [44] is the default set of parton distribution
functions used in all these samples, except for the single-top-quark MC, which uses CTEQ6M.



4 Triggers and event reconstruction 

4.1 Dijet trigger selection 

Events are collected using single-jet triggers, which are based on jets reconstructed only from 
calorimetric information. This procedure yields inferior resolution to jets reconstructed offline 
with PF constituents, but provides faster reconstruction that meets trigger requirements. As the 
instantaneous luminosity is time-dependent, the specific jet-$p_T$ thresholds change with time.
The triggers used to select dijet events have partial overlap. Those with lower-$p_T$ thresholds
have high prescale settings to accommodate the higher data-acquisition rates, and some events
selected with these lower-$p_T$ triggers are also collected at higher thresholds.

To avoid double counting of phase space, each event is assigned to a specific trigger. To do 
this, we compute the trigger efficiency as a function of reconstructed $p_T^{AVG}$, select an interval in
trigger efficiency where the efficiency is maximum (>95%) for that range of $p_T^{AVG}$, and assign
that trigger to the appropriate $p_T^{AVG}$ interval. The assignment is based on the jet $p_T$ values
reconstructed offline (but not groomed). Table 1 shows the $p_T$ thresholds for each of the jet
triggers used in the analysis, and the corresponding intervals of $p_T$ to which the triggered
events are assigned.
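
The assignment of each event to exactly one trigger can be sketched as a simple lookup using the thresholds and intervals of Table 1. The names `TRIGGER_BINS` and `assign_trigger`, and the convention that each interval includes its lower edge, are assumptions of this example.

```python
# (lower edge, upper edge, trigger pT threshold), all in GeV, from Table 1.
TRIGGER_BINS = [
    (220.0, 300.0, 190),
    (300.0, 450.0, 240),
    (450.0, float("inf"), 370),
]


def assign_trigger(pt_avg):
    """Return the trigger threshold assigned to an event, or None.

    pt_avg is the offline (ungroomed) average pT of the two leading jets.
    """
    for low, high, threshold in TRIGGER_BINS:
        if low <= pt_avg < high:
            return threshold
    return None  # below 220 GeV the event is not used in the dijet analysis
```

Because the intervals do not overlap, an event that fired several triggers is still counted only once.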

Table 1: Trigger $p_T$ thresholds for individual jets, and corresponding $p_T^{AVG}$ intervals used to
assign the triggered events in the dijet analysis.

Trigger $p_T$ threshold (GeV)    $p_T^{AVG}$ range (GeV)
190                              220-300
240                              300-450
370                              >450

4.2 V+jet trigger selection

Several triggers are also used to collect events corresponding to the topology of V+jet events,
where the V decays via electrons or muons in the final state. For the W+jet channels, the triggers
consist of several single-lepton triggers, with lepton identification criteria applied online. To
assure an acceptable event rate, leptons are required to be isolated from other tracks and energy
depositions in the calorimeters. For the W($\mu\nu_\mu$) channel, the trigger thresholds for the muon
$p_T$ are in the range of 17 to 40 GeV. The higher thresholds are used at higher instantaneous
luminosity. The combined trigger efficiency for signal events that pass offline requirements
(described in Section 5) is ≈92%.

For the W($e\nu_e$) events, the electron $p_T$ threshold ranges from 25 to 65 GeV. To enhance the
fraction of W+jet events in the data, the single-electron triggers are also required to have
minimum thresholds on the magnitude of the imbalance in transverse energy ($E_T^{miss}$) and on the
transverse mass ($m_T$) of the (electron + $E_T^{miss}$) system, where $m_T^2 = 2 E_T^{e} E_T^{miss} (1 - \cos\phi)$, and $\phi$
is the angle between the directions of $\vec{p}_T^{\,e}$ and $\vec{E}_T^{miss}$. The combined efficiency for electron W+jet
events that pass the offline criteria is ≈99%.
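
A short numerical sketch of the transverse mass of the lepton plus missing-energy system, using the relation $m_T^2 = 2 E_T^{\ell} E_T^{miss} (1 - \cos\phi)$. The function name and the example values are illustrative assumptions.

```python
import math


def transverse_mass(et_lepton, et_miss, dphi):
    """Transverse mass (GeV) from lepton ET, missing ET, and their opening angle."""
    return math.sqrt(2.0 * et_lepton * et_miss * (1.0 - math.cos(dphi)))


# Back-to-back lepton and missing energy of 40 GeV each give m_T = 80 GeV,
# close to the W mass; a collinear configuration gives m_T = 0.
mt_back_to_back = transverse_mass(40.0, 40.0, math.pi)
mt_collinear = transverse_mass(40.0, 40.0, 0.0)
```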

The Z($\mu\mu$) channel uses the same single-muon triggers as the W($\mu\nu_\mu$) channel. The Z(ee)
channel uses dielectron triggers with lower thresholds for $p_T$ (17 and 8 GeV), and additional
isolation requirements. These triggers are 99% efficient for all Z+jet events that pass the final
offline selection criteria.

4.3 Binning jets as a function of $p_T$

The jet $p_T$ bins introduced in Eq. (1) are given in Table 2 for V+jet and dijet events. The jet
$p_T$ is re-evaluated for each grooming algorithm. Because there are large biases due to jet
misassignment in the dijet events, especially at small $p_T$ (when three particle-level jets are often
reconstructed as two jets in the detector, or vice versa), the $p_T$ intervals for these events
begin at 220 GeV. Furthermore, the smaller number of events in the V+jet samples precludes the
study of these events beyond $p_T$ = 450 GeV.

Table 2: Intervals in ungroomed jet $p_T$ for the V+jet and dijet analyses.

Bin    $p_T$ interval (GeV)    Analysis
1      125-150                 V+jet
2      150-220                 V+jet
3      220-300                 V+jet, dijet
4      300-450                 V+jet, dijet
5      450-500                 dijet
6      500-600                 dijet
7      600-800                 dijet
8      800-1000                dijet
9      1000-1500               dijet

4.4 Event reconstruction 

As indicated above, events are reconstructed using the particle-flow algorithm, which
combines the information from all subdetectors to reconstruct the particle candidates in an event.
The algorithm categorizes particles into muons, electrons, photons, charged hadrons, and
neutral hadrons. The resulting PF candidates are passed through each jet clustering algorithm of
Section 2, as implemented in FastJet (Version 3.0.1) [45, 46].

The reconstructed interaction vertex characterized by the largest value of $\sum_i (p_{T,i}^{track})^2$, where
$p_{T,i}^{track}$ is the transverse momentum of the $i$th charged track associated with the vertex, is defined
as the leading primary vertex (PV) of the event. This vertex is used as the reference vertex for
all PF objects in the event. A pileup interaction can affect the reconstruction of jet momenta
and $E_T^{miss}$, as well as lepton isolation and b-tagging efficiency. To mitigate these effects, a
track-based algorithm is used to remove all charged hadrons that are not consistent with originating
from the leading PV.
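
The vertex choice and the removal of pileup charged hadrons can be sketched as below. The data layout (a dictionary of track $p_T$ lists per vertex, and candidates as dictionaries with `type` and `vertex` keys) is a hypothetical simplification for this example and does not reflect the CMS event format.

```python
def leading_vertex(vertices):
    """Label of the vertex with the largest sum of squared track pT.

    `vertices` maps a vertex label to the list of its track pT values (GeV).
    """
    return max(vertices, key=lambda v: sum(pt * pt for pt in vertices[v]))


def remove_pileup_charged(candidates, lead):
    """Drop charged hadrons whose associated vertex is not the leading PV."""
    return [c for c in candidates
            if not (c["type"] == "charged_hadron" and c["vertex"] != lead)]


vertices = {"A": [10.0, 10.0, 10.0], "B": [25.0, 2.0], "C": [1.0, 1.0, 1.0, 1.0]}
lead = leading_vertex(vertices)  # sums of pT^2: A = 300, B = 629, C = 4
candidates = [
    {"type": "charged_hadron", "vertex": "A"},
    {"type": "charged_hadron", "vertex": "B"},
    {"type": "photon", "vertex": None},
]
kept = remove_pileup_charged(candidates, lead)
```

Neutral candidates carry no vertex information and therefore survive this step, which is why a separate area-based correction is still needed for them.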

Electron reconstruction requires the matching of an energy cluster in the ECAL with a track
extrapolated from the silicon tracker [47]. Identification criteria based on the energy distribution
of showers in the ECAL and consistency of tracks with the primary vertex are imposed on
electron candidates. Additional requirements remove any electrons produced through conversions
of photons in detector material. The analysis considers electrons only in the range of $|\eta| < 2.5$,
excluding the transition region $1.44 < |\eta| < 1.57$ between the central and endcap ECAL
detectors because of poorer resolution for electrons in this region. Muons are reconstructed using
two algorithms [48]: (i) in which tracks in the silicon tracker are matched to signals in the muon 
chambers, and (ii) in which a global fit is performed to a track seeded by signals in the external 
muon system. The muon candidates are required to be reconstructed through both algorithms. 
Additional identification criteria are imposed on muon candidates to reduce the fraction of 
tracks misidentified as muons, and to reduce contamination from muon decays in flight. These 
criteria include the number of hits detected in the tracker and in the outer muon system, the 
quality of the fit to a muon track, and its consistency of originating from the leading PV. 

Charged leptons from V-boson decays are expected to be isolated from other energy
depositions in the event. For each lepton candidate, a cone with radius 0.3 for muons and 0.4 for
electrons is chosen around the direction of the track at the event vertex. When the scalar sum
of the transverse momenta of reconstructed particles within that cone, excluding the
contribution from the lepton candidate, exceeds ≈10% of the $p_T$ of the lepton candidate, that lepton is
ignored. The exact isolation requirement depends on the $\eta$, $p_T$, and flavor of the lepton. Muons
and electrons are required to have $p_T$ > 30 GeV and > 80 GeV, respectively. The large
threshold for electrons ensures good trigger efficiency. To avoid double counting, isolated charged
leptons are removed from the list of PF objects that are clustered into jets.
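
A minimal sketch of the cone-based isolation test follows. It uses a flat 10% threshold, whereas the actual requirement depends on the $\eta$, $p_T$, and flavor of the lepton; the function name, the `(pT, eta, phi)` layout, and the example values are assumptions of this illustration.

```python
import math


def is_isolated(lepton, others, cone, max_rel_iso=0.10):
    """True if the scalar pT sum inside the cone is at most max_rel_iso * pT.

    `lepton` and each entry of `others` are (pT, eta, phi) tuples with phi
    in (-pi, pi]; `others` must not contain the lepton itself.
    """
    pt_l, eta_l, phi_l = lepton
    sum_pt = 0.0
    for pt, eta, phi in others:
        dphi = abs(phi - phi_l)
        if dphi > math.pi:
            dphi = 2 * math.pi - dphi
        if math.hypot(eta - eta_l, dphi) < cone:
            sum_pt += pt
    return sum_pt <= max_rel_iso * pt_l


muon = (40.0, 0.0, 0.0)
# Only the 2 GeV particle lies inside the 0.3 cone: 2 GeV is below 4 GeV.
quiet = is_isolated(muon, [(2.0, 0.1, 0.1), (10.0, 1.0, 1.0)], cone=0.3)
# A 5 GeV particle inside the cone exceeds 10% of the muon pT.
busy = is_isolated(muon, [(5.0, 0.1, 0.1)], cone=0.3)
```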

After removal of isolated leptons and charged hadrons from pileup vertices, only the neutral
hadron component from pileup remains and is included in the jet clustering. This remaining
pileup contribution to the jet energy is removed by applying a correction based on a mean $p_T$
per unit area of ($\Delta y \times \Delta\phi$) originating from neutral particles [30, 49]. This quantity is computed
using the $k_T$ algorithm, and corrects the jet energy by the amount of energy expected from
pileup in the jet cone. This "active area" method adds a large number of soft "ghost" particles
to the clustering sequence to determine the effective area subtended by each jet. This procedure
is done for all grooming algorithms just as for the ungroomed jets. The active area of a groomed
jet is smaller than that of an ungroomed jet, and the pileup correction is therefore also smaller.
Different responses in the endcap and central barrel calorimeters necessitate using $\eta$-dependent
jet corrections. The amount of energy expected from the remnants of the hard collision (the
underlying event) is estimated from minimum-bias data and MC events, and is added back
into the jet.
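
The area-based subtraction amounts to removing the product of the event $p_T$ density and the jet area. The sketch below assumes that both the density `rho` and the jet area are already known; the function name and the numerical values are illustrative, and the $\eta$ dependence and underlying-event term mentioned above are not modeled.

```python
def pileup_corrected_pt(pt_jet, area, rho):
    """Subtract the expected pileup pT, rho * area, from the jet pT.

    rho is the pT per unit area (GeV) and area is the jet active area in
    (y, phi); the result is clipped at zero.
    """
    return max(pt_jet - rho * area, 0.0)


rho = 6.0  # illustrative pileup density in GeV per unit area
ungroomed = pileup_corrected_pt(200.0, area=1.5, rho=rho)
groomed = pileup_corrected_pt(200.0, area=0.5, rho=rho)
```

With the same density, the jet with the smaller active area receives the smaller correction (3 GeV instead of 9 GeV in this example).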

In addition, the pileup-subtracted jet four-momenta in data are corrected for nonlinearities in
$\eta$ and $p_T$ by using a $p_T$- and $\eta$-dependent correction to account for the difference between
the response in MC-simulated events and data [50]. The jet corrections are derived for the
ungroomed jet algorithms but are also applied to the groomed algorithms, thereby adding
systematic uncertainty in the energy of groomed jets.

5 Event selection 

We apply several other selection criteria to minimize instrumental background and electronic 
noise. In particular, accepted events must have at least one good primary vertex (Section 4.4).
Backgrounds from additional beam interactions are reduced by applying a variety of require- 
ments on charged tracks. Finally, calorimeter noise is minimized through restrictions on timing 
and electronic pulse shapes expected for signals. 

Dijet events are required to have at least two AK7 jets, each with $p_T$ > 50 GeV and $|y| < 2.5$, and
each jet must satisfy the jet quality criteria discussed in Ref. [37]. No third-jet veto is applied.

Reconstruction of W and Z bosons in V+jet events begins with identification of charged leptons
and a calculation of $E_T^{miss}$, described in the previous section. Candidates for $Z \to \ell\ell$ ($\ell = e$
or $\mu$) decays are reconstructed by combining two isolated electrons or muons and requiring the
dilepton invariant mass to be in the $80 < M_{\ell\ell} < 100$ GeV range. An accurate measurement
of $E_T^{miss}$ is essential for distinguishing the W signal from background processes. The $E_T^{miss}$ in
the event is defined using the PF objects, and this analysis requires $E_T^{miss} > 50$ GeV. Candidate
$W \to \ell\nu_\ell$ decays are identified primarily through the presence of a significant $E_T^{miss}$ and a
single isolated lepton of large $p_T$, with $p_T$ and $m_T$ of the W candidate obtained by combining
the lepton and the $E_T^{miss}$ vectors.

The analysis of V+jet events is mainly of interest for the regime of $p_T$ > 120 GeV, in which the
opposing jet tends to have large $p_T$ as well, because of momentum conservation. In fact, the
leading jet in each event (independent of clustering algorithm and jet radius) is required to have
$p_T$ > 125 GeV and $|y| < 2.5$. A back-to-back topology between the vector boson and the leading
jet is ensured by the additional selection of $\Delta\phi(V, \mathrm{jet}) > 2$ and $\Delta R(\ell, \mathrm{jet}) > 1$. Requiring such
highly boosted jets, in addition to the tight isolation criteria for the leptons, greatly suppresses
the background from multijet production. In the $W \to \ell\nu_\ell$+jet analysis, additional rejection
of multijet background is achieved by requiring $m_T(W) > 50$ GeV. No subleading-jet veto is
applied.

Figures 2(a) and (b) show the $p_T$ distributions for the leading AK7 jet selected in Z+jet and
W+jet candidate events, respectively. Given the unique signature for highly-boosted vector
bosons recoiling from jets, the selections suffice to define very pure samples of V+jet events.
In the Z($\ell\ell$)+jet analysis, the additional constraint on dilepton mass removes almost all
background contributions, yielding a purity of ≈99% for Z+jet events, with ≈1% contamination
from diboson production. The W+jet candidate sample contains ≈82% W+jet events, with
small background contributions from $t\bar{t}$ (13%), single top-quark (3%), and diboson and Z+jet
(1% each) events based on MC simulation. The small number of events expected from these
processes are subtracted using MC predictions for the jet mass from the W+jet candidate events,
before correcting the data for detector effects. Similarly, the small number of events expected
from diboson production are subtracted from the Z+jet candidates.

6 Influence of pileup on jet grooming algorithms 

During the data taking the instantaneous LHC luminosity exceeded ~3.0 × 10³³ cm⁻² s⁻¹, or
an average of ten interactions per bunch crossing. Such pileup collisions are not correlated with
the hard-scattering process that triggers an interesting event, but present a background from
low-pT interactions that can affect the measured energies of jets and their observed masses.
Methods to mitigate these effects are part of standard event reconstruction, as discussed in
Section 4.4, and are essential for extracting correct jet multiplicities and energies. The jet mass
is expected to be particularly sensitive to pileup [1] for jets of large angular extent that contain
many particles. Grooming techniques are designed to reduce the effective area of such jets and
thereby minimize sensitivity to pileup. We examine this issue through studies of jet mass in the
presence of pileup.

The mean jet mass ⟨mJ⟩ for AK jets is presented for size parameters R = 0.5, 0.7, and 0.8, as a
function of the total number of reconstructed primary vertices (NPV) in Fig. 3(a), for data and
MC simulation. The mean mass for NPV = 1 increases linearly with the jet radius from 0.5 to
0.8. A measure of the dependence of ⟨mJ⟩ on pileup is given by the slope of a linear fit to the
jet mass versus NPV. The ratios of these slopes (sR) are found to be roughly consistent with the
ratio of the third power of the jet radius, as summarized in Table 3.

Table 3: Slopes of linear fits of ⟨mJ⟩ as a function of NPV for AK jets of different R values.

Ratio of slopes    Measured              Expected
s0.7/s0.5          2.7 ± 0.9 (stat.)     (0.7/0.5)³ = 2.74
s0.8/s0.5          3.3 ± 1.0 (stat.)     (0.8/0.5)³ = 4.10
s0.8/s0.7          1.2 ± 0.2 (stat.)     (0.8/0.7)³ = 1.49



This is in agreement with predictions for scaling of the mean mass [51]. The R³ dependence can
be understood in terms of the increase of the jet area as R². Simultaneously, the contribution of
these particles to the jet mass scales with the distance between them, or ~R/2, yielding another
power of R.
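As a quick numerical check of the R³ argument, the expected slope ratios can be recomputed and compared with the measured ratios quoted in Table 3. The sketch below is illustrative only; the function names are hypothetical, and the "pull" is simply the difference divided by the quoted statistical uncertainty.

```python
# Measured slope ratios and statistical uncertainties, copied from Table 3.
measured_ratios = {
    (0.7, 0.5): (2.7, 0.9),
    (0.8, 0.5): (3.3, 1.0),
    (0.8, 0.7): (1.2, 0.2),
}

def expected_slope_ratio(r_num, r_den):
    """Ratio of pileup slopes if the slope scales as the cube of the jet radius."""
    return (r_num / r_den) ** 3

def pull(r_num, r_den):
    """(measured - expected) in units of the quoted statistical uncertainty."""
    value, error = measured_ratios[(r_num, r_den)]
    return (value - expected_slope_ratio(r_num, r_den)) / error

for (r_num, r_den), (value, error) in measured_ratios.items():
    expected = expected_slope_ratio(r_num, r_den)
    print(f"s{r_num}/s{r_den}: measured {value} +- {error}, "
          f"expected {expected:.2f}, pull {pull(r_num, r_den):+.2f}")
```

With the Table 3 inputs, all three measured ratios lie within about 1.5 statistical standard deviations of the R³ expectation.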

In Fig. 3(b) we show the dependence of ⟨mJ⟩ on NPV, for AK7 jets, for different grooming
algorithms. The grooming significantly reduces the impact of pileup on ⟨mJ⟩, as reflected by
the decrease of the slope of the linear fit to the groomed-jet data points, as summarized in
Table 4.

Figure 3: Distributions of the average jet mass for AK jets as a function of the number of
reconstructed primary vertices: (a) for different jet radii, and (b) for AK7 jets, comparing the impact
of grooming algorithms to results without grooming.

Table 4: Values of slopes for the dependence of ⟨mJ⟩ on NPV for AK jets with different radii and
clustering algorithms.

Jet R    Clustering algorithm    Slope (GeV/PV)
AK5      ungroomed               0.10 ± 0.03 (stat.)
AK7      ungroomed               0.28 ± 0.03 (stat.)
AK7      filtered                0.16 ± 0.02 (stat.)
AK7      trimmed                 0.12 ± 0.04 (stat.)
AK7      pruned                  0.10 ± 0.05 (stat.)
AK8      ungroomed               0.33 ± 0.03 (stat.)



The observed agreement between data and simulation in Fig. 3 provides support for our
characterization of jet grooming and pileup, and the decrease in slopes suggests that grooming is
indeed an effective tool for suppressing the impact of pileup on jets with large R parameters.
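To make the size of the reduction concrete, the AK7 slopes from Table 4 can be expressed relative to the ungroomed slope. The sketch below assumes, purely for illustration, that the quoted statistical uncertainties are uncorrelated, which the text does not state; the function name is hypothetical.

```python
import math

# AK7 slopes in GeV per primary vertex, with statistical uncertainties (Table 4).
ak7_slopes = {
    "ungroomed": (0.28, 0.03),
    "filtered": (0.16, 0.02),
    "trimmed": (0.12, 0.04),
    "pruned": (0.10, 0.05),
}

def ratio_with_error(numerator, denominator):
    """Ratio of two (value, error) pairs, assuming uncorrelated uncertainties."""
    (a, sigma_a), (b, sigma_b) = numerator, denominator
    ratio = a / b
    return ratio, ratio * math.hypot(sigma_a / a, sigma_b / b)

for name in ("filtered", "trimmed", "pruned"):
    ratio, error = ratio_with_error(ak7_slopes[name], ak7_slopes["ungroomed"])
    print(f"{name}: slope is {ratio:.2f} +- {error:.2f} of the ungroomed AK7 slope")
```

Under this assumption, each grooming algorithm brings the AK7 slope down to roughly one third to three fifths of its ungroomed value, comparable to the slope of ungroomed AK5 jets.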



7 Corrections and systematic uncertainties 

Before comparison of the jet mass distributions with QCD predictions, the data are corrected to
the particle level for detector effects, such as resolution and acceptance. The simulated
particle-level jets are reconstructed with the same algorithm and with the same parameters as the PF
jets. We use the unfolding procedure described in Refs. [52-56] to correct the jet mass, through
an iterative technique for finding the maximum-likelihood solution of the unfolding problem.
The detector response matrix is obtained in MC studies of jets. In general, the number of
iterations must be tuned to minimize the impact of statistical fluctuations on the result. In
practice, however, the procedure is largely insensitive to the precise settings and binning of
events, and four iterations usually suffice. A larger number of iterations was found to provide
the same results except for small fluctuations in the tails of distributions. A simpler bin-by-bin
unfolding is used as a cross-check, and is found to provide similar results, with fluctuations in
the tails of the distributions. The jet transverse momenta are not unfolded.
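The idea behind an iterative unfolding can be illustrated with a minimal sketch of the standard multiplicative update that converges toward the maximum-likelihood solution for Poisson-distributed counts. This is not the analysis code: the exact procedure is the one in the cited references, and the response matrix, spectrum, and function name below are invented for illustration.

```python
def iterative_unfold(response, measured, prior, n_iter=4):
    """Iteratively unfold a binned spectrum.

    response[i][j] is the probability that an entry in true bin j is
    reconstructed in bin i; measured and prior are lists of bin contents.
    """
    n_reco, n_true = len(response), len(response[0])
    # Probability for an entry in true bin j to be reconstructed in any bin.
    efficiency = [sum(response[i][j] for i in range(n_reco)) for j in range(n_true)]
    truth = list(prior)
    for _ in range(n_iter):
        # Fold the current truth estimate through the detector response.
        folded = [sum(response[i][j] * truth[j] for j in range(n_true))
                  for i in range(n_reco)]
        ratio = [measured[i] / folded[i] if folded[i] > 0 else 0.0
                 for i in range(n_reco)]
        # Reweight each true bin by the data/folded ratio of the bins it feeds.
        truth = [truth[j] * sum(response[i][j] * ratio[i] for i in range(n_reco))
                 / efficiency[j] for j in range(n_true)]
    return truth

# Toy example with made-up numbers: three mass bins with modest migration.
toy_response = [[0.8, 0.2, 0.0],
                [0.2, 0.6, 0.3],
                [0.0, 0.2, 0.7]]
toy_truth = [100.0, 50.0, 10.0]
toy_measured = [sum(toy_response[i][j] * toy_truth[j] for j in range(3))
                for i in range(3)]
flat_prior = [sum(toy_measured) / 3.0] * 3
toy_unfolded = iterative_unfold(toy_response, toy_measured, flat_prior, n_iter=4)
print([round(x, 1) for x in toy_unfolded])
```

Stopping after a small number of iterations acts as a regularization: more iterations reduce the dependence on the prior but amplify statistical fluctuations, which is consistent with the behavior in the tails described above.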

Systematic uncertainties are estimated by modifying the response matrix for each source of 
uncertainty by ±1 standard deviation, and comparing the mass distribution to the nominal 
results, based on simulated PYTHIA6 events. The difference in the unfolded mass spectrum 
from such a change is taken as the uncertainty arising from that source. 
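The variation procedure can be sketched as follows. The per-bin uncertainty from one source is taken here as the larger of the up and down shifts, and independent sources are combined in quadrature; both choices are assumptions made for this illustration, since the text only states that the difference from the nominal result is taken as the uncertainty. All numbers and names are hypothetical.

```python
import math

def source_uncertainty(nominal, shifted_up, shifted_down):
    """Per-bin uncertainty from one source: largest shift from the nominal result."""
    return [max(abs(up - nom), abs(down - nom))
            for nom, up, down in zip(nominal, shifted_up, shifted_down)]

def total_uncertainty(per_source):
    """Quadrature sum over sources, bin by bin (assumes independent sources)."""
    n_bins = len(next(iter(per_source.values())))
    return [math.sqrt(sum(values[i] ** 2 for values in per_source.values()))
            for i in range(n_bins)]

# Made-up unfolded spectra for a nominal and a +-1 sigma response matrix.
nominal_spectrum = [0.50, 0.30, 0.20]
jes_up_spectrum = [0.52, 0.29, 0.19]
jes_down_spectrum = [0.47, 0.31, 0.21]

jes_uncertainty_per_bin = source_uncertainty(
    nominal_spectrum, jes_up_spectrum, jes_down_spectrum)
print([round(x, 3) for x in jes_uncertainty_per_bin])
```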

The experimental uncertainties that can affect the unfolding of the jet mass include the jet
energy scale (JES), jet energy resolution (JER), and jet angular resolution (JAR). The uncertainty
from JES is estimated by raising and lowering the jet four-momenta by the measured
uncertainty as a function of jet pT and η [50], which typically corresponds to 1-2% for the jets in this
analysis. Two additional pT- and η-independent uncertainties are included: a 1% uncertainty
to account for differences observed between the measured and predicted W mass for high-pT
jets in a tt̄-enriched sample, and a 3% uncertainty to account for differences in the groomed and
ungroomed energy responses found in MC simulation [34].
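Because the invariant mass is homogeneous of degree one in the four-momentum, scaling all four components by a common factor shifts the jet mass by the same fraction. The short sketch below illustrates this with a made-up four-momentum and an illustrative 2% shift; the function names are hypothetical.

```python
import math

def jet_mass(p4):
    """Invariant mass from a four-momentum (E, px, py, pz) in GeV."""
    energy, px, py, pz = p4
    mass_squared = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(mass_squared, 0.0))

def scale_four_momentum(p4, factor):
    """Scale all four components by a common factor, as in a JES variation."""
    return tuple(factor * component for component in p4)

nominal_p4 = (500.0, 300.0, 0.0, 390.0)  # made-up jet, mass of about 89 GeV
jes_shift = 0.02                         # illustrative 2% scale uncertainty

mass_nominal = jet_mass(nominal_p4)
mass_up = jet_mass(scale_four_momentum(nominal_p4, 1.0 + jes_shift))
mass_down = jet_mass(scale_four_momentum(nominal_p4, 1.0 - jes_shift))
print(mass_down / mass_nominal, mass_up / mass_nominal)
```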

The impact of uncertainties in JER and JAR on mJ is evaluated by smearing the jet energies, as
well as the resolutions in η and φ, each by 10% in the MC simulation relative to the particle-level
generated jets [50]. These estimated uncertainties on JER and JAR are found to be essentially
the same for all jet grooming techniques in MC studies. Since this analysis uses jets constructed
from PF constituents, the charged particles have excellent energy and angular resolutions, but
their use induces a dependence on tracking uncertainties, e.g., tracking efficiency. This
dependence is accounted for implicitly in the ±10% changes in jet energy and angular resolutions,
since such changes would lead to a difference between expected and observed values of these
quantities. The same is true for the neutral electromagnetic component of the jet (primarily
from π0 → γγ decays).
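One common way to change a simulated resolution by a fixed fraction is to rescale each jet's reconstructed-minus-generated residual. The text does not specify the exact method used, so the sketch below is an assumption for illustration, and the function name is hypothetical; the same operation could be applied to the energy, η, or φ.

```python
import math

def rescale_resolution(generated, reconstructed, fraction=0.10):
    """Move the reconstructed value away from (or toward) the generated one.

    fraction=+0.10 widens the residual by 10%, and fraction=-0.10 narrows it.
    """
    return generated + (1.0 + fraction) * (reconstructed - generated)

# Made-up (generated, reconstructed) jet energies in GeV.
toy_jets = [(100.0, 110.0), (200.0, 190.0), (150.0, 153.0)]

def rms_residual(pairs):
    """Root-mean-square of reconstructed minus generated values."""
    return math.sqrt(sum((reco - gen) ** 2 for gen, reco in pairs) / len(pairs))

widened_jets = [(gen, rescale_resolution(gen, reco, 0.10)) for gen, reco in toy_jets]
print(rms_residual(widened_jets) / rms_residual(toy_jets))
```

Since every residual is multiplied by the same factor, the width of the residual distribution changes by exactly that factor while its sign pattern and the generated values are left untouched.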

The remaining sources of uncertainty are estimated from MC simulation. The tracking infor- 
mation is not sensitive to the neutral hadronic component of jets, and this small contribution 
is taken directly from simulation. We estimate this remaining uncertainty by comparing the 
unfolded data using PYTHIA6 and using HERWIG++, and assign the difference as a system- 
atic uncertainty. This also accounts for the uncertainty from modeling parton showers. The 
latter effect often comprises the largest uncertainty in the unfolded jet mass distributions as 
described below. Other theoretical ambiguities that can affect the unfolding of the jet mass 
include the variation of the parton distribution functions and the modeling of initial and final- 
state radiation (ISR/FSR). The former was investigated and found to be much smaller than 
the difference between the unfolding with PYTHIA6 and the unfolding with HERWIG++, and 
hence is neglected. The latter is included implicitly in the uncertainty between PYTHIA6 and 
HERWIG++.



As described in Section 4.4, the jets used in this analysis are reconstructed after removing the
charged hadrons that appear to emanate from subleading primary vertices. This procedure 
produces a dramatic (~60%) reduction in the pileup contribution to jets. The residual uncer- 
tainty from pileup is obtained through MC simulation, estimated by increasing and decreasing 
the cross section for minimum-bias events by 8%. 

In the dijet analysis, there can be incorrect assignments of leading reconstructed jets relative to
the generator level, e.g., two generator-level jets can be matched to three reconstructed jets, or
vice versa. This effect causes a bias in the unfolding procedure, which becomes greater at small
pT. This bias is corrected through MC studies of matching of particle-level jets to reconstructed
jets, and the magnitude of the bias correction is also added to the overall systematic uncertainty.
Such misassignments are negligible in the V+jet analysis.

8 Results from dijet final states 

The differential probability distributions of Eq. (2) for mJ^AVG of the two leading jets in dijet
events, corrected for detector effects in the jet mass, are displayed in Figs. 4-7 for seven bins
in pT^AVG, along with the HERWIG++ predictions. The pT^AVG is not corrected to the particle level,
because the correction is expected to be negligible for the momenta considered. Results are
shown for ungroomed jets and for the three categories of grooming. Each distribution is
normalized to unity. The ratios of the MC simulations used in Figs. 4-7 to the results for data, for
PYTHIA6, PYTHIA8, and for HERWIG++, are given in Figs. 8-11, respectively.



The largest systematic uncertainty is from the choice of parton-shower modeling used to
calculate detector corrections, with small, but still significant uncertainties arising from jet energy
scale and resolution, and small contributions from jet angular resolution and pileup. In the
220-300 GeV and 300-450 GeV jet-pT bins, the mJ < 50 GeV region is dominated by
uncertainties from unfolding (50-100%), which are negligible for pT^AVG > 450 GeV. For mJ > 50 GeV,
the JES, JER, JAR, and pileup uncertainties each contribute ≈10%. For the 450-1000 GeV pT
bins, parton showering dominates the uncertainties, which are around 50-100% below the peak
of the mJ distribution and 5-10% for the rest of the distribution. For pT > 1000 GeV, statistical
uncertainty dominates the entire mass range.



For clarity, the distributions in Figs. 8-11 are truncated where few events are recorded. Bins
in mJ^AVG with uncertainties of > 100% are ignored to avoid overlap with more precise
measurements in other pT^AVG bins. The agreement with HERWIG++ modeling of parton showers
appears to be best for pT^AVG > 300 GeV and mJ^AVG > 20 GeV. However, the ungroomed and
filtered jets show worse agreement for 20 < mJ^AVG < 50 GeV when pT^AVG > 450 GeV. For all
generators and all pT^AVG bins, the agreement is better at larger jet masses. The disagreement is
largest at the very lowest mass values, which correspond to the region most sensitive to the
underlying event description and pileup, and where the amount of showering is apparently
underestimated in the simulation.

9 Results from V+jet final states 

This section provides the probability density distributions as functions of the mass of the
leading jet in V+jet events. These distributions are corrected for detector effects in the jet mass, and
are compared to MC expectations from MadGraph (interfaced to PYTHIA6) and HERWIG++.
The jet mass distributions are studied in different ranges of pT between 125 and 450 GeV, as
given in Table 2. (Just as in the dijet results, pT is not corrected to the particle level.) For jets
reconstructed with the CA algorithm (R = 1.2), we study only the events with pT > 150 GeV,
which is most interesting for heavy particle searches in the highly-boosted regime, where all
decay products are contained within R = 1.2 jets [24].



For clarity, the distributions are also truncated at large mass values where few events are
recorded. Jet-mass bins with relative uncertainties > 100% are also ignored to minimize
overlap with more precise measurements in other pT bins.



Figures 12-15 show mass distributions for the leading AK7 jet accompanying a Z boson in
Z(ℓℓ)+jet events for the ungroomed, filtered, trimmed, and pruned clustering of jets,
respectively. Both PYTHIA6 and HERWIG++ show good agreement with data for all pT bins, but
especially so for pT > 300 GeV. As in the case of the dijet analysis, the data at small jet mass are
not modeled satisfactorily, but show modest improvement after applying the grooming
procedures. To investigate several popular choices of jet grooming at CMS, Figs. 16-17 show the
distributions in mJ for pruned CA8 and filtered CA12 jets in Z+jet events. For groomed CA
jets, both PYTHIA6 and HERWIG++ provide good agreement with the data, with some possible
inconsistency for mJ < 20 GeV and at large mJ for pT < 300 GeV for the ungroomed and filtered
jets. Figures 18-21 show the corresponding distributions for the mass of the leading jet
accompanying the W boson for AK7 jets in W(ℓνℓ)+jet events for the ungroomed, filtered, trimmed,
and pruned clustering algorithms, and Figs. 22-23 show the distributions for pruned CA8 and
filtered CA12 jets. For CA8 and CA12 jets, only particular grooming algorithms and pT bins are
chosen for illustration. The MC simulation shows good agreement with data, just as observed
for Z+jet events.

Figure 4: Unfolded distributions for the mean mass of the two leading jets in dijet events for
reconstructed AK7 jets, separated according to intervals in pT^AVG (the mean pT of the two jets).
The data are shown by the symbols indicating different bins in the mean pT of the two jets. The
statistical uncertainty is shown in light shading, and the total uncertainty in dark shading.
Predictions from HERWIG++ are given by the dotted lines. To enhance visibility, the distributions
for larger values of pT^AVG are scaled up by the factors given in the legend.

Figure 5: Unfolded distributions for the mean mass of the two leading jets in dijet events for
reconstructed filtered AK7 jets, separated according to intervals in pT^AVG (the mean pT of the
two jets). The data are shown by the symbols indicating different bins in the mean pT of the
two jets. The statistical uncertainty is shown in light shading, and the total uncertainty in dark
shading. Predictions from HERWIG++ are given by the dotted lines. To enhance visibility, the
distributions for larger values of pT^AVG are scaled up by the factors given in the legend.

Figure 6: Unfolded distributions for the mean mass of the two leading jets in dijet events for
reconstructed trimmed AK7 jets, separated according to intervals in pT^AVG (the mean pT of the
two jets). The data are shown by the symbols indicating different bins in the mean pT of the
two jets. The statistical uncertainty is shown in light shading, and the total uncertainty in dark
shading. Predictions from HERWIG++ are given by the dotted lines. To enhance visibility, the
distributions for larger values of pT^AVG are scaled up by the factors given in the legend.

[Figure 7 plot: CMS, L = 5 fb⁻¹ at √s = 7 TeV, pruned AK7 dijets. Data with statistical and total
uncertainty bands compared with HERWIG++ (Tune 23) in pT^AVG bins of 220-300, 300-450,
450-500, 500-600, 600-800, 800-1000, and 1000-1500 GeV; horizontal axis: mJ^AVG (GeV).]

Figure 7: Unfolded distributions for the mean mass of the two leading jets in dijet events for
reconstructed pruned AK7 jets, separated according to intervals in pT^AVG (the mean pT of the
two jets). The data are shown by the symbols indicating different bins in the mean pT of the
two jets. The statistical uncertainty is shown in light shading, and the total uncertainty in dark
shading. Predictions from HERWIG++ are given by the dotted lines. To enhance visibility, the
distributions for larger values of pT^AVG are scaled up by the factors given in the legend.

Figure 8: Ratio of MC simulation to unfolded distributions of the jet mass for AK7 jets for
the seven bins in pT^AVG. The statistical uncertainty is shown in light shading, and the total
uncertainty is shown in dark shading. The comparison for PYTHIA6 is shown in solid lines, for
PYTHIA8 in dashed lines, and for HERWIG++ in dotted lines.

Figure 9: Ratio of MC simulation to unfolded distributions of the jet mass for filtered AK7 jets
for the seven bins in pT^AVG. The statistical uncertainty is shown in light shading, and the total
uncertainty is shown in dark shading. The comparison for PYTHIA6 is shown in solid lines, for
PYTHIA8 in dashed lines, and for HERWIG++ in dotted lines.

Figure 10: Ratio of MC simulation to unfolded distributions of the jet mass for trimmed AK7
jets for the seven bins in pT^AVG. The statistical uncertainty is shown in light shading, and the
total uncertainty is shown in dark shading. The comparison for PYTHIA6 is shown in solid
lines, for PYTHIA8 in dashed lines, and for HERWIG++ in dotted lines.

Figure 11: Ratio of MC simulation to unfolded distributions of the jet mass for pruned AK7 jets
for the seven bins in pT^AVG. The statistical uncertainty is shown in light shading, and the total
uncertainty is shown in dark shading. The comparison for PYTHIA6 is shown in solid lines, for
PYTHIA8 in dashed lines, and for HERWIG++ in dotted lines.
Figure 12: Unfolded, ungroomed AK7 mJ distribution for Z(ℓℓ)+jet events. The data (black
symbols) are compared to MC expectations from MadGraph+PYTHIA6 (solid lines) and
HERWIG++ (dotted lines) on the left. The ratio of MC to data is given on the right. The statistical
uncertainty is shown in light shading, and the total uncertainty in dark shading.

Figure 13: Unfolded AK7 filtered mJ distribution for Z(ℓℓ)+jet events. The data (black symbols)
are compared to MC expectations from MadGraph+PYTHIA6 (solid lines) and HERWIG++
(dotted lines) on the left. The ratio of MC to data is given on the right. The statistical uncertainty
is shown in light shading, and the total uncertainty in dark shading.

Figure 14: Unfolded AK7 trimmed mJ distribution for Z(ℓℓ)+jet events. The data (black
symbols) are compared to MC expectations from MadGraph+PYTHIA6 (solid lines) and
HERWIG++ (dotted lines) on the left. The ratio of MC to data is given on the right. The statistical
uncertainty is shown in light shading, and the total uncertainty in dark shading.

Figure 16: Unfolded CA8 pruned mJ distribution for Z(ℓℓ)+jet events. The data (black symbols)
are compared to MC expectations from MadGraph+PYTHIA6 (solid lines) and HERWIG++
(dotted lines) on the left. The ratio of MC to data is given on the right. The statistical uncertainty
is shown in light shading, and the total uncertainty in dark shading.

Figure 17: Unfolded CA12 filtered mJ distribution for Z(ℓℓ)+jet events. The data (black
symbols) are compared to MC expectations from MadGraph+PYTHIA6 (solid lines) and
HERWIG++ (dotted lines) on the left. The ratio of MC to data is given on the right. The statistical
uncertainty is shown in light shading, and the total uncertainty in dark shading.

Figure 18: Distributions in mJ for unfolded, ungroomed AK7 jets in W(ℓνℓ)+jet events. The data
(black symbols) are compared to MC expectations from MadGraph+PYTHIA6 (solid lines)
and HERWIG++ (dotted lines) on the left. The ratios of MC to data are given on the right. The
statistical uncertainty is shown in light shading, and the total uncertainty in dark shading.

Figure 19: Distributions in mJ for unfolded, filtered AK7 jets in W(ℓνℓ)+jet events. The
data (black symbols) for different bins in pT are compared to MC expectations from
MadGraph+PYTHIA6 (solid lines) and HERWIG++ (dotted lines) on the left. The ratios of MC to
data are given on the right. The statistical uncertainty is shown in light shading, and the total
uncertainty in dark shading.

Figure 20: Distributions in mJ for unfolded, trimmed AK7 jets in W(ℓνℓ)+jet events. The
data (black symbols) for different bins in pT are compared to MC expectations from
MadGraph+PYTHIA6 (solid lines) and HERWIG++ (dotted lines) on the left. The ratios of MC to
data are given on the right. The statistical uncertainty is shown in light shading, and the total
uncertainty in dark shading.

Figure 21: Distributions in mJ for unfolded, pruned AK7 jets in W(ℓνℓ)+jet events. The
data (black symbols) for different bins in pT are compared to MC expectations from
MadGraph+PYTHIA6 (solid lines) and HERWIG++ (dotted lines) on the left. The ratios of MC to
data are given on the right. The statistical uncertainty is shown in light shading, and the total
uncertainty in dark shading.

Figure 22: Distributions in mJ for unfolded, pruned CA8 jets in W(ℓνℓ)+jet events. The
data (black symbols) for different bins in pT are compared to MC expectations from
MadGraph+PYTHIA6 (solid lines) and HERWIG++ (dotted lines) on the left. The ratios of MC to
data are given on the right. The statistical uncertainty is shown in light shading, and the total
uncertainty in dark shading.

Figure 23: Distributions in mJ for unfolded, filtered CA12 jets in W(ℓνℓ)+jet events. The
data (black symbols) for different bins in pT are compared to MC expectations from
MadGraph+PYTHIA6 (solid lines) and HERWIG++ (dotted lines) on the left. The ratios of MC to
data are given on the right. The statistical uncertainty is shown in light shading, and the total
uncertainty in dark shading.

10 Summary

We have presented the differential distributions in jet mass for inclusive dijet and V+jet events,
defined through the anti-kT algorithm for a size parameter of 0.7 for ungroomed jets, as well
as for jets groomed through filtering, trimming, and pruning. In addition, similar distributions
for V+jet events were given for pruned Cambridge-Aachen jets with a size parameter of 0.8, as
well as for filtered Cambridge-Aachen jets with a size parameter of 1.2. The impact of pileup
on jet mass was also investigated.

Higher-order QCD matrix-element predictions for partons, coupled to parton-shower Monte 
Carlo programs that generate jet mass in dijet and V+jet events, are found to be in good agree- 
ment with data. A comparison of data with MC simulation indicates that both PYTHIA6 and 
HERWIG++ reproduce the data reasonably well, and that the HERWIG++ predictions for more 
aggressive grooming algorithms, i.e., those that remove larger fractions of contributions to the 
original ungroomed jet mass, agree somewhat better with observations. It is also observed that 
the more aggressive grooming procedures lead to somewhat better agreement between data 
and MC simulation. 

In comparing the results from the V+jet analysis with those for the two leading jets in multijet 
events, the predictions provide slightly better agreement with the V+jet data. This observation 
suggests that simulation of quark jets is better than of gluon jets. Differences between data 
and simulation are larger at small jet mass values, which also correspond to the region more 
affected by pileup and soft QCD radiation. 

These studies represent the first detailed investigations of techniques for characterizing jet
substructure based on data collected by the CMS experiment at a center-of-mass energy of 7 TeV.
For the trimming and pruning algorithms, these studies mark the first publication on this
subject from the LHC, and provide an important benchmark for their use in searches for massive
particles. Finally, the intrinsic stability of these algorithms to pileup effects is likely to
contribute to a more rapid and widespread use of these techniques in future high-luminosity runs
at the LHC.